Word-level human interpretable scoring mechanism for novel text detection using Tsetlin Machines
Abstract
Recent research in novelty detection focuses mainly on document-level classification, employing deep neural networks (DNN). However, the black-box nature of DNNs makes it difficult to extract an exact explanation of why a document is considered novel. In addition, dealing with novelty at the word level is crucial to provide a more fine-grained analysis than what is available at the document level. In this work, we propose a Tsetlin Machine (TM)-based architecture for scoring individual words according to their contribution to novelty. Our approach encodes a description of the novel documents using the linguistic patterns captured by TM clauses. We then adapt this description to measure how much a word contributes to making documents novel. Our experimental results demonstrate how our approach breaks down novelty into interpretable phrases, successfully measuring novelty.
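The word-scoring idea in the abstract can be illustrated with a minimal sketch. It assumes a simplified bag-of-words setting in which each learned TM clause is a conjunction of word literals (words that must be present, words that must be absent) carrying a signed vote: +1 for the novel class, -1 against it. A clause that fires on a document is assumed to split its vote evenly among the present words it requires. The `Clause` class, the `word_scores` function, the example clauses, and the even-split rule are all hypothetical illustrations, not the authors' scoring formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    # Hypothetical learned clause: a conjunction of word literals.
    include: frozenset  # words that must appear (positive literals)
    exclude: frozenset  # words that must NOT appear (negated literals)
    weight: int         # assumed vote: +1 for "novel", -1 against

    def fires(self, doc_words: set) -> bool:
        # The clause is true only if all included words are present
        # and none of the excluded words are present.
        return self.include <= doc_words and not (self.exclude & doc_words)


def word_scores(doc: str, clauses: list[Clause]) -> dict[str, float]:
    """Score each word by the votes of the firing clauses that use it.

    Assumption: a firing clause splits its vote evenly across its
    positive literals; negated literals receive no credit.
    """
    words = set(doc.lower().split())
    scores = {w: 0.0 for w in words}
    for clause in clauses:
        if not clause.fires(words) or not clause.include:
            continue
        share = clause.weight / len(clause.include)
        for w in clause.include:
            scores[w] += share
    return scores


if __name__ == "__main__":
    clauses = [
        Clause(frozenset({"quantum", "ledger"}), frozenset(), 1),
        Clause(frozenset({"market"}), frozenset({"quantum"}), -1),  # blocked by "quantum"
        Clause(frozenset({"ledger"}), frozenset(), 1),
    ]
    scores = word_scores("Quantum ledger reshapes market", clauses)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    print(ranked)
    # [('ledger', 1.5), ('quantum', 0.5), ('market', 0.0), ('reshapes', 0.0)]
```

Splitting a clause's vote evenly is only one possible attribution choice; weighting words by how often they appear in firing clauses, or by per-class vote totals, would produce different rankings while keeping the same clause-level interpretability.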
Similar resources
Protein Word Detection using Text Segmentation Techniques
Literature in Molecular Biology is abundant with linguistic metaphors. There have been works in the past that attempt to draw parallels between linguistics and biology, driven by the fundamental premise that proteins have a language of their own. Since word detection is crucial to the decipherment of any unknown language, we attempt to establish a problem mapping from natural language text to p...
Text Genre Detection Using Common Word Frequencies
In this paper we present a method for detecting the text genre quickly and easily following an approach originally proposed in authorship attribution studies which uses as style markers the frequencies of occurrence of the most frequent words in a training corpus (Burrows, 1992). In contrast to this approach we use the frequencies of occurrence of the most frequent words of the entire written l...
Word and phone level acoustic confidence scoring
This paper presents a word level confidence scoring technique based on a combination of multiple features extracted from the output of a phonetic classifier. The goal of this research was to develop a robust confidence measure based strictly on acoustic information. This research focused on methods for augmenting standard log likelihood ratio techniques with additional information to improve th...
Interpretable support vector machines for functional data
Support Vector Machines (SVM) have been shown to be a powerful nonparametric classification technique even for high-dimensional data. Although predictive ability is important, obtaining an easy-to-interpret classifier is also crucial in many applications. Linear SVM provides a classifier based on a linear score. In the case of functional data, the coefficient function that defines such linear sc...
Grammatical structures for word-level sentiment detection
Existing work in fine-grained sentiment analysis focuses on sentences and phrases but ignores the contribution of individual words and their grammatical connections. This is because of a lack of both (1) annotated data at the word level and (2) algorithms that can leverage syntactic information in a principled way. We address the first need by annotating articles from the information technology...
Journal
Journal title: Applied Intelligence
Year: 2022
ISSN: 0924-669X, 1573-7497
DOI: https://doi.org/10.1007/s10489-022-03281-1